-
Notifications
You must be signed in to change notification settings - Fork 0
add "is_timeout" stat for search result #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 0.16
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR instruments search routines with an “is_timeout” statistic by extending the Dataset API to carry arbitrary stats, adding a stats JSON to InnerSearchParam, setting the timeout flag during search, and exposing the stats via new Dataset methods. Sequence diagram for search routine timeout stat propagationsequenceDiagram
participant SearchWithRequest
participant InnerSearchParam
participant BasicSearcher_search_impl
participant DatasetImpl
SearchWithRequest->>InnerSearchParam: create and initialize stats
SearchWithRequest->>BasicSearcher_search_impl: pass InnerSearchParam
BasicSearcher_search_impl->>InnerSearchParam: set stats["is_timeout"] = true if timeout
SearchWithRequest->>DatasetImpl: call Stats(stats.dump())
DatasetImpl->>DatasetImpl: store stats_ string
ER diagram for Dataset stats storageerDiagram
DATASET {
string stats_
}
INNER_SEARCH_PARAM {
json stats
}
DATASET ||--o{ INNER_SEARCH_PARAM : "stores stats from"
Class diagram for updated Dataset and DatasetImpl stats APIclassDiagram
class Dataset {
<<interface>>
+GetExtraInfoSize() int64_t
+Stats(stats: std::string) DatasetPtr
+GetStats() std::string
+GetStats(stat_keys: std::vector<std::string>) std::vector<std::string>
}
class DatasetImpl {
+Stats(stats: std::string) DatasetPtr
+GetStats() std::string
+GetStats(stat_keys: std::vector<std::string>) std::vector<std::string>
-stats_ std::string
}
DatasetImpl --|> Dataset
Class diagram for InnerSearchParam stats additionclassDiagram
class InnerSearchParam {
+stats std::shared_ptr<JsonType>
+InnerSearchParam()
+operator=(other: InnerSearchParam) InnerSearchParam&
...
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/dataset_impl.cpp:36` </location>
<code_context>
+std::vector<std::string>
+DatasetImpl::GetStats(const std::vector<std::string>& stat_keys) const {
+ auto json = JsonType::parse(this->stats_);
+ std::vector<std::string> result;
+ for (const auto& key : stat_keys) {
</code_context>
<issue_to_address>
**issue:** Consider error handling for invalid JSON in stats_.
If stats_ is malformed, JsonType::parse may throw or behave unpredictably. Adding validation or error handling would improve robustness, especially if stats_ is externally set.
</issue_to_address>
### Comment 2
<location> `tests/test_index.cpp:2124-2130` </location>
<code_context>
->Owner(false);
auto res = index->KnnSearch(query, 10, search_param);
REQUIRE(res.has_value());
+ auto result = res.value();
+ REQUIRE(result->GetStats() != "{}");
+ auto stats = result->GetStats({"is_timeout"});
+ REQUIRE(stats.size() == 1);
+ bool has_timeout_result = (stats[0] == "true" or stats[0] == "false");
+ REQUIRE(has_timeout_result);
}
}
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases where the stats string is malformed or missing keys.
Adding tests for malformed stats strings, missing keys, or unexpected values will improve GetStats error handling and robustness.
```suggestion
REQUIRE(result->GetStats() != "{}");
auto stats = result->GetStats({"is_timeout"});
REQUIRE(stats.size() == 1);
bool has_timeout_result = (stats[0] == "true" or stats[0] == "false");
REQUIRE(has_timeout_result);
// Edge case: malformed stats string
{
struct MalformedResult {
std::string GetStats() const { return "{is_timeout: true"; } // missing closing brace, not valid JSON
std::vector<std::string> GetStats(const std::vector<std::string>& keys) const {
// Simulate error handling: return empty string for malformed
return std::vector<std::string>(keys.size(), "");
}
} malformed_result;
auto malformed_stats = malformed_result.GetStats({"is_timeout"});
REQUIRE(malformed_stats.size() == 1);
REQUIRE(malformed_stats[0] == "");
}
// Edge case: missing key in stats
{
struct MissingKeyResult {
std::string GetStats() const { return R"({"other_key": "true"})"; }
std::vector<std::string> GetStats(const std::vector<std::string>& keys) const {
// Simulate missing key: return empty string for missing
std::vector<std::string> out;
for (const auto& k : keys) {
out.push_back(k == "other_key" ? "true" : "");
}
return out;
}
} missing_key_result;
auto missing_stats = missing_key_result.GetStats({"is_timeout"});
REQUIRE(missing_stats.size() == 1);
REQUIRE(missing_stats[0] == "");
}
// Edge case: unexpected value
{
struct UnexpectedValueResult {
std::string GetStats() const { return R"({"is_timeout": "maybe"})"; }
std::vector<std::string> GetStats(const std::vector<std::string>& keys) const {
// Simulate unexpected value
std::vector<std::string> out;
for (const auto& k : keys) {
out.push_back(k == "is_timeout" ? "maybe" : "");
}
return out;
}
} unexpected_value_result;
auto unexpected_stats = unexpected_value_result.GetStats({"is_timeout"});
REQUIRE(unexpected_stats.size() == 1);
REQUIRE((unexpected_stats[0] != "true" && unexpected_stats[0] != "false"));
}
}
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
src/dataset_impl.cpp
Outdated
|
|
||
| std::vector<std::string> | ||
| DatasetImpl::GetStats(const std::vector<std::string>& stat_keys) const { | ||
| auto json = JsonType::parse(this->stats_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Consider error handling for invalid JSON in stats_.
If stats_ is malformed, JsonType::parse may throw or behave unpredictably. Adding validation or error handling would improve robustness, especially if stats_ is externally set.
tests/test_index.cpp
Outdated
| REQUIRE(result->GetStats() != "{}"); | ||
| auto stats = result->GetStats({"is_timeout"}); | ||
| REQUIRE(stats.size() == 1); | ||
| bool has_timeout_result = (stats[0] == "true" or stats[0] == "false"); | ||
| REQUIRE(has_timeout_result); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding tests for edge cases where the stats string is malformed or missing keys.
Adding tests for malformed stats strings, missing keys, or unexpected values will improve GetStats error handling and robustness.
| REQUIRE(result->GetStats() != "{}"); | |
| auto stats = result->GetStats({"is_timeout"}); | |
| REQUIRE(stats.size() == 1); | |
| bool has_timeout_result = (stats[0] == "true" or stats[0] == "false"); | |
| REQUIRE(has_timeout_result); | |
| } | |
| } | |
| REQUIRE(result->GetStats() != "{}"); | |
| auto stats = result->GetStats({"is_timeout"}); | |
| REQUIRE(stats.size() == 1); | |
| bool has_timeout_result = (stats[0] == "true" or stats[0] == "false"); | |
| REQUIRE(has_timeout_result); | |
| // Edge case: malformed stats string | |
| { | |
| struct MalformedResult { | |
| std::string GetStats() const { return "{is_timeout: true"; } // missing closing brace, not valid JSON | |
| std::vector<std::string> GetStats(const std::vector<std::string>& keys) const { | |
| // Simulate error handling: return empty string for malformed | |
| return std::vector<std::string>(keys.size(), ""); | |
| } | |
| } malformed_result; | |
| auto malformed_stats = malformed_result.GetStats({"is_timeout"}); | |
| REQUIRE(malformed_stats.size() == 1); | |
| REQUIRE(malformed_stats[0] == ""); | |
| } | |
| // Edge case: missing key in stats | |
| { | |
| struct MissingKeyResult { | |
| std::string GetStats() const { return R"({"other_key": "true"})"; } | |
| std::vector<std::string> GetStats(const std::vector<std::string>& keys) const { | |
| // Simulate missing key: return empty string for missing | |
| std::vector<std::string> out; | |
| for (const auto& k : keys) { | |
| out.push_back(k == "other_key" ? "true" : ""); | |
| } | |
| return out; | |
| } | |
| } missing_key_result; | |
| auto missing_stats = missing_key_result.GetStats({"is_timeout"}); | |
| REQUIRE(missing_stats.size() == 1); | |
| REQUIRE(missing_stats[0] == ""); | |
| } | |
| // Edge case: unexpected value | |
| { | |
| struct UnexpectedValueResult { | |
| std::string GetStats() const { return R"({"is_timeout": "maybe"})"; } | |
| std::vector<std::string> GetStats(const std::vector<std::string>& keys) const { | |
| // Simulate unexpected value | |
| std::vector<std::string> out; | |
| for (const auto& k : keys) { | |
| out.push_back(k == "is_timeout" ? "maybe" : ""); | |
| } | |
| return out; | |
| } | |
| } unexpected_value_result; | |
| auto unexpected_stats = unexpected_value_result.GetStats({"is_timeout"}); | |
| REQUIRE(unexpected_stats.size() == 1); | |
| REQUIRE((unexpected_stats[0] != "true" && unexpected_stats[0] != "false")); | |
| } | |
| } | |
| } |
- implement for hgraph & ivf - introduce new search param: "max_time_cost_ms"(double) - update doc for ivf Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Summary by Sourcery
Add JSON-based statistics support to datasets and search operations to record and retrieve an "is_timeout" flag for search results.
New Features:
Enhancements:
Tests: